A Flexible Measure of Contextual Similarity for Biomedical Terms

Authors

  • Irena Spasic
  • Sophia Ananiadou
Abstract

We present a measure of contextual similarity for biomedical terms. The contextual features need to be explored, because newly coined terms are not explicitly described and efficiently stored in biomedical ontologies and their inner features (e.g. morphologic or orthographic) do not always provide sufficient information about the properties of the underlying concepts. The context of each term can be represented as a sequence of syntactic elements annotated with biomedical information retrieved from an ontology. The sequences of contextual elements may be matched approximately by edit distance defined as the minimal cost incurred by the changes (including insertion, deletion and replacement) needed to transform one sequence into the other. Our approach augments the traditional concept of edit distance by elements of linguistic and biomedical knowledge, which together provide flexible selection of contextual features and their comparison.
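As an illustration of the edit-distance idea described in the abstract, the following is a minimal sketch, not the authors' actual measure. It assumes each contextual element is a pair of a syntactic category and an optional ontology class; that representation, the names `Element`, `substitution_cost` and `edit_distance`, and the cost values 0.5 and 1.0 are all hypothetical choices made for this example.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation of a contextual element:
# (syntactic category, ontology class or None). Not taken from the paper.
Element = Tuple[str, Optional[str]]


def substitution_cost(a: Element, b: Element) -> float:
    """Illustrative replacement cost; the values are assumptions."""
    if a == b:
        return 0.0  # identical elements match for free
    if a[0] == b[0]:
        return 0.5  # same syntactic category, different ontology class
    return 1.0      # different syntactic category


def edit_distance(
    source: Sequence[Element],
    target: Sequence[Element],
    sub_cost: Callable[[Element, Element], float] = substitution_cost,
    indel_cost: float = 1.0,
) -> float:
    """Minimal total cost of insertions, deletions and replacements
    needed to transform `source` into `target` (dynamic programming)."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost  # delete every source element
    for j in range(1, cols):
        dist[0][j] = j * indel_cost  # insert every target element
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # deletion
                dist[i][j - 1] + indel_cost,  # insertion
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


# Two made-up contexts that differ only in the class of the last element.
ctx_a = [("NP", "protein"), ("V", None), ("NP", "receptor")]
ctx_b = [("NP", "protein"), ("V", None), ("NP", "gene")]
distance = edit_distance(ctx_a, ctx_b)
```

Passing a different `sub_cost` function is one way such a measure could be made flexible: the cost of replacing one element with another can then depend on whatever linguistic or ontological knowledge is available.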

Similar resources

Hybrid approach combining contextual and statistical information for identifying MEDLINE citation terms

There is a strong demand for developing automated tools for extracting pertinent information from the biomedical literature that is a rich, complex, and dramatically growing resource, and is increasingly accessed via the web. This paper presents a hybrid method based on contextual and statistical information to automatically identify two MEDLINE citation terms: NIH grant numbers and databank ac...

A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations

Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomed...

A corpus based approach to find similar keywords for search engine marketing

Automatic thesaurus generation is used by search engines for query expansion. The same concept is used by search engine marketing companies to suggest keyword terms to their clients to improve the client’s ratings for different search engines. This paper presents and evaluates a corpus based method to find similar terms. The corpus is generated by scraping websites in different categories. A fe...

Using automatically learnt verb selectional preferences for classification of biomedical terms

In this paper, we present an approach to term classification based on verb selectional patterns (VSPs), where such a pattern is defined as a set of semantic classes that could be used in combination with a given domain-specific verb. VSPs have been automatically learnt based on the information found in a corpus and an ontology in the biomedical domain. Prior to the learning phase, the corpus is...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Journal:
  • Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Publication date: 2005